Spam messages recognizing method based on word embedding and convolutional neural network
LAI Wenhui, QIAO Yupeng
Journal of Computer Applications    2018, 38 (9): 2469-2476.   DOI: 10.11772/j.issn.1001-9081.2018030643
Abstract
Filtering and recognizing spam messages has considerable social value and is a timely problem. Traditional hand-crafted feature selection methods can lead to data sparseness, insufficient co-occurrence of feature information, and difficulty in feature extraction. To solve these problems, a spam message recognition method based on word embedding and a convolutional neural network was proposed. Firstly, the skip-gram model of word2vec was trained on the Wiki Chinese corpus to obtain a word embedding for each word in the short message dataset, and each short message was represented as a two-dimensional feature matrix formed by stacking the embeddings of its words. Then, the feature matrix was used as the input to the convolutional neural network. Multi-scale features of the short message were extracted by convolution kernels of different sizes in the convolution layer, and the 1-max pooling strategy was used to obtain the locally optimal features. Finally, the fused feature vector, composed of the locally optimal features, was fed into a softmax classifier to obtain the classification result. Experiments were performed on 100,000 short messages. The results show that the recognition accuracy of the convolutional neural network model reaches 99.5%, which is 2.4 to 5.1 percentage points higher than that of traditional machine learning models using the same feature extraction method, and that the recognition accuracy of every model remains above 94%.
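The pipeline described in the abstract (feature matrix, multi-scale convolution, 1-max pooling, softmax) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the embedding dimension, kernel heights, number of kernels, the ReLU activation, and the random untrained weights are all assumptions made for the example, and the names `conv_1max`, `softmax`, and `classify` are hypothetical.

```python
import math
import random


def conv_1max(matrix, kernel, bias=0.0):
    """Slide one kernel (h x d) over a message matrix (n x d) and
    return the largest ReLU activation (1-max pooling).
    Returns 0.0 if the message is shorter than the kernel."""
    h = len(kernel)
    best = 0.0  # starting at 0 applies the (assumed) ReLU floor
    for start in range(len(matrix) - h + 1):
        s = bias
        for i in range(h):
            for x, w in zip(matrix[start + i], kernel[i]):
                s += x * w
        best = max(best, s)
    return best


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(matrix, kernels, out_weights, out_bias):
    """One pooled feature per kernel, fused into a vector,
    then a linear layer followed by softmax."""
    features = [conv_1max(matrix, k) for k in kernels]
    logits = [
        sum(f * w for f, w in zip(features, row)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return features, softmax(logits)


rng = random.Random(0)
dim = 4  # assumed embedding size, tiny for readability

# Stand-in for word2vec output: one row of `dim` values per word.
message = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]

# Kernels spanning 2, 3, and 4 words (assumed scales), two of each.
kernels = [
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(h)]
    for h in (2, 3, 4)
    for _ in range(2)
]

# Two output classes: normal message vs. spam. Weights are untrained.
out_weights = [[rng.uniform(-1, 1) for _ in kernels] for _ in range(2)]
out_bias = [0.0, 0.0]

features, probs = classify(message, kernels, out_weights, out_bias)
print(len(features), probs)
```

Because each kernel contributes a single pooled value, the fused feature vector has a fixed length regardless of how many words the message contains, which is what lets messages of different lengths share one softmax layer.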